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Abstract 

Background: Stochastic modeling and simulation provide powerful predictive methods for the intrinsic 
understanding of fundamental mechanisms in complex biochemical networks. Typically, such mathematical models 
involve networks of coupled jump stochastic processes with a large number of parameters that need to be suitably 
calibrated against experimental data. In this direction, the parameter sensitivity analysis of reaction networks is an 
essential mathematical and computational tool, yielding information regarding the robustness and the identifiability 
of model parameters. However, existing sensitivity analysis approaches such as variants of the finite difference 
method can have an overwhelming computational cost in models with a high-dimensional parameter space. 

Results: We develop a sensitivity analysis methodology suitable for complex stochastic reaction networks with a 
large number of parameters. The proposed approach is based on Information Theory methods and relies on the 
quantification of information loss due to parameter perturbations between time-series distributions. For this reason, 
we need to work on path-space, i.e., the set consisting of all stochastic trajectories, hence the proposed approach is 
referred to as "pathwise". The pathwise sensitivity analysis method is realized by employing the rigorously-derived 
Relative Entropy Rate, which is directly computable from the propensity functions. A key aspect of the method is that 
an associated pathwise Fisher Information Matrix (FIM) is defined, which in turn constitutes a gradient-free approach 
to quantifying parameter sensitivities. The structure of the FIM turns out to be block-diagonal, revealing hidden 
parameter dependencies and sensitivities in reaction networks. 

Conclusions: As a gradient-free method, the proposed sensitivity analysis provides a significant advantage when 
dealing with complex stochastic systems with a large number of parameters. In addition, the knowledge of the 
structure of the FIM can allow to efficiently address questions on parameter identifiability, estimation and robustness. 
The proposed method is tested and validated on three biochemical systems, namely: (a) a protein 
production/degradation model where explicit solutions are available, permitting a careful assessment of the method, 
(b) the p53 reaction network where quasi-steady stochastic oscillations of the concentrations are observed, and for 
which continuum approximations (e.g. mean field, stochastic Langevin, etc.) break down due to persistent oscillations 
between high and low populations, and (c) an Epidermal Growth Factor Receptor model which is an example of a 
high-dimensional stochastic reaction network with more than 200 reactions and a corresponding number of 
parameters. 
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Background 

The need of an intrinsic understanding of the inter- 
play between complexity and robustness of biological 
processes and their corresponding design principles is 
well-documented, see for instance [1-5]. The concept of 
robustness can be described as "a property that allows 
a system to maintain its functions against internal and 
external perturbations" [3]. When referring to mathemat- 
ical models of complex biological processes, one of the 
mathematical tools to describe the robustness of a sys- 
tem to perturbations is sensitivity analysis which attempts 
to determine which parameter directions (or their com- 
binations) are the most/least sensitive to perturbations 
and uncertainty, or to errors resulting from experimen- 
tal parameter estimation. Recently there has been sig- 
nificant progress in developing sensitivity analysis tools 
for low-dimensional stochastic processes, modeling well- 
mixed chemical reactions and biological networks. Some 
of the mathematical tools included log-likelihood meth- 
ods and Girsanov transformations [6-8], polynomial chaos 
[9], finite difference methods and their variants [10,11] 
and pathwise sensitivity methods [12]. However, existing 
sensitivity analysis approaches can have an overwhelm- 
ing computational cost, either due to high variance in the 
gradient estimators, or in models with a high-dimensional 
parameter space, [13]. 

The aforementioned methods focus on the sensitivity of 
stochastic trajectories and corresponding averages. How- 
ever, it is often the case that we are interested in the 
sensitivity of probability density functions (PDF), which 
are typically non-Gaussian in nonlinear and/or discrete 
systems. In that latter direction, there is a broad recent 
literature relying on information theory tools, and where 
sensitivity is estimated by using the Relative Entropy and 
the Fisher Information Matrix between PDFs, providing a 
quantification of information loss along different param- 
eter perturbations. We refer to [14-18] for the case when 
the parametric PDF is explicitly known. For instance, 
in [16] the parametric PDFs structure is known as it 
is obtained through an entropy maximization subject to 
constraints. Knowing the form of the PDF allows to carry 
out calculations such as estimating the relative entropy 
and identifying the most sensitive parameter combina- 
tions. Furthermore, the pathwise PDFs are also known in 
reaction networks when a Linear Noise Approximation 
(LNA) is employed and for this case the relative entropy 
can be explicitly computed allowing thus to carry out 
parametric sensitivity analysis, [18]. However, for complex 
stochastic dynamics of large reaction networks, spatial 
Kinetic Monte Carlo algorithms and molecular dynamics, 
such explicit formulas for the PDFs are not available in 
general. 

In [19], we address such challenges by introducing a new 
methodology for complex stochastic dynamics based on 



the Relative Entropy Rate (RER) which provides a mea- 
sure of the sensitivity of the entire time-series distribution. 
Typically, the space of all such time-series is referred in 
probability theory as the "path space". RER measures 
the loss of information per unit time in path space after 
an arbitrary perturbation of parameter combinations. RER 
and the corresponding Fisher Information Matrix (FIM) 
become computationally feasible as they admit explicit 
formulas which depend only on the propensity functions 
(see (4) and (6), respectively). In fact, we showed in [19] 
that the proposed pathwise approach to sensitivity analy- 
sis has the following features: First, it is rigorously valid for 
the sensitivity of long-time, stationary dynamics in path 
space, including for example bistable, periodic and pulse- 
like dynamics. Second, it is a gradient-free sensitivity 
analysis method suitable for high-dimensional parameter 
spaces as the ones typically arising in complex biochemi- 
cal networks. Third, the RER method does not require the 
explicit knowledge of the equilibrium PDFs, relying only 
on information for local dynamics and thus making it suit- 
able for non-equilibrium steady state systems. In [19], we 
demonstrated these features by focusing on two classes of 
problems: Langevin particle systems with either reversible 
(gradient) or non-reversible (non-gradient) forcing, high- 
lighting the ability of the method to carry out sensi- 
tivity analysis in non-equilibrium systems; and spatially 
extended Kinetic Monte Carlo models, showing that the 
method can handle high-dimensional problems. 

In this paper, we extend and apply the pathwise sen- 
sitivity analysis method in [19] to biochemical reaction 
networks, and demonstrate the intrinsic sensitivity struc- 
ture of the network. Such systems are typically mod- 
eled as jump Markov processes and they are simulated 
using either exact algorithms such as the Stochastic Sim- 
ulation Algorithm (SSA), [20-22] and the next-reaction 
method [23], or by employing approximations such as 
mean field ODEs, tau-leap [24] and stochastic Langevin 
methods [25]. 

We show that the proposed pathwise method allows 
us to discover the intrinsic sensitivities of the reaction 
network by decomposing the FIM into diagonal blocks. 
The block-diagonal structure of the proposed FIM reveals, 
in a straightforward way, the sensitivity interdependen- 
cies between the system parameters. For instance, if each 
propensity function depends only on one parameter - 
usually the reaction constant- then the FIM is a diagonal 
matrix (see (14)). The sparse representation of the FIM 
can be essential in optimal experimental design as well 
as in parameter identifiability and robustness where each 
subset of the parameters defined by a block of the FIM 
can be treated separately. Moreover, our earlier rigorous 
analysis [19] for the stationary regime suggests suitable 
extensions in the transient case which are here tested 
and validated. Finally, we present strategies for efficiently 



Pantazis etal. BMC Bioinformatics 201 3, 14:31 1 
http://www.biomedcentral.eom/1 471 -21 05/1 4/31 1 



Page 3 of 19 



and reliably implementing the proposed method for high- 
dimensional, complex stochastic systems using an array 
of existing accelerated versions of the SSA algorithm such 
as mean field, stochastic Langevin, r-leap approximations 
and their variants, [21,24-27]. 

We test the proposed set of methods and computational 
strategies in three examples of biochemical networks. 
First, we consider a prototypical protein produc- 
tion/degradation model, i.e, a single-species birth/death 
model, with explicitly known formulas for the stationary 
and the time-dependent distribution. This model serves as 
a benchmark where the differences between the proposed 
pathwise FIM and the stationary FIM are highlighted. 
Second, we study the parameter sensitivities of a p53 gene 
model for cell cycle regulation and response to DNA dam- 
age, that incorporates the feedback between the tumor 
suppressor p53 gene and the oncogene Mdm2 [28]. This 
is a reaction network that exhibits random oscillations in 
its steady state, and for which continuum approximations 
of the SSA such as LNA break down due to persistent 
oscillations between high and low populations. Using 
the proposed method, we also study a far more complex 
network, the epidermal growth factor receptor (EGFR) 
model, describing signaling processes between mam- 
malian cells [29-31]. This is a high-dimensional system 
both in the number of variables and parameters, includ- 
ing 94 species and 207 reactions. Having a gradient-free 
method for this example with parameter space of dimen- 
sion 207 provides a significant advantage over gradient 
methods such as finite differencing, where the computa- 
tion of a very high number of partial derivatives and/or 
directional derivatives is needed and with possibly sig- 
nificant variance that scales with the dimension, [11]. 
By contrast, the eigenvalue/eigenvector analysis of the 
proposed FIM identifies the order from least to most sen- 
sitive directions (determined by the eigenvectors of the 
FIM) by the corresponding eigenvalues. 

In Methods, we present the derivation of the Relative 
Entropy Rate and its corresponding Fisher Information 
Matrix for continuous-time jump Markov processes as 
well as we reveal the block-diagonal structure of the FIM 
for commonly encountered reaction networks, continued 
by the presentation of both unbiased and biased -but 
accelerated- statistical estimators for RER and FIM. Then, 
in the Results, we apply and validate the proposed path- 
wise sensitivity analysis methodology in three complex 
biological reaction networks. 

Methods 

We consider a well-mixed reaction network with N spe- 
cies, S = {Si, . . . ,Stv}, and M reactions, R = {R\, . . . ,Rm}- 
The state of the system at any time t > 0 is denoted by an 
Af-dimensional vector X(£) = [X\(t), . . . f X^(t)] T where 
Xi(t) is the number of molecules of species S; at time t 



Let the Af-dimensional vector vj correspond to the stoi- 
chiometry vector of y-th reaction such that v,y is the sto- 
ichiometric coefficient of species S; in reaction Rj. Given 
that the reaction network at time t is in state X(£) = x, the 
propensity function, <zy(x), is defined so that the infinites- 
imal quantity aj(x)dt gives the transition probability of 
the y-th reaction to occur in the time interval [t,t + dt\. 
Propensities are typically dependent on the state of the 
system and the reaction conditions (i.e., external param- 
eters) of the network such as temperature, pressure, etc. 
Mathematically, {X(t)}teR+ is a continuous-time, time- 
homogeneous, jump Markov process with countable state 
space E c N^. The transition rates of the Markov pro- 
cess are the propensity functions «/(•)> j = 1, . . . ,M. The 
transition rates determine the clock of the updates (or 
jumps) from a current state x to a new (random) state x r 
through the total rate <Zo( x ) := YlpLi a j( x ) while the tran- 
sition probabilities of the process are determined by the 
ratio l^jj. We refer to Algorithm 1 for the details of the 
stochastic simulation. 

Relative entropy 

Assume that two probability distributions (or more gen- 
erally probability measures) V and V have corresponding 
probability densities p = p(x) and p = p(x). Then, the 
Relative Entropy or Kullback-Leibler divergence of V with 
respect to V is defined as [32,33] 

n{p\v)-.= ^pix)\o % (^d x . (i) 

In a more general setting, relative entropy is defined 
as K(V\V) := /log(^) dV where g is a func- 
tion known as Radon-Nikodym derivative while the 
integration is performed with respect to the prob- 
ability measure V, [34]. A necessary condition for 
the relative entropy to be well-defined is that the 
Radon-Nikodym derivative exists which is satisfied when 
V is absolutely continuous with respect to V. Rel- 
ative entropy has been utilized in a diverse range 
of scientific fields from statistical mechanics [34] 
to coding in telecommunications (information theory) 
[33] and finance [35], and it possesses the following three 
fundamental properties: 

(i) it is always non-negative, 

(ii) it equals to zero if and only if V = V P-almost 
everywhere, and, 

an) n(v\v) < oo if and only if V and V are absolutely 
continuous with respect to each other. 

From an information theory perspective, relative 
entropy quantifies the loss of information when V is uti- 
lized instead of V, [33]. In other words, relative entropy 
quantifies the inefficiency of assuming an incorrect or 
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perturbed distribution V instead of employing the true 
distribution V. Therefore, even though not a metric, rel- 
ative entropy has been used as a suitable quantity for the 
assessment of parametric sensitivities since the higher the 
relative entropy (i.e., the information loss) in some per- 
turbed direction, the larger the sensitivity should be in this 
direction. 

Pathwise relative entropy and relative entropy rate 

Proceeding to the pathwise formulation of the relative 
entropy, we assume that the propensities depend on a 
parameter vector 0 e M. K (i.e., aj(x) = a 6 - (x)) while 
the continuous-time jump Markov process {X(t)} teR+ 
lies in the stationary regime. We denote by fi° (x) the 
steady state (or stationary) distribution of the stochas- 
tic process X(£). The stationary path distribution of the 
process in the interval [0, T] is denoted by Q® Q T ^ . Notice 
that path distributions (i.e., time-series distributions) are 
high-dimensional complex objects; for instance, if we con- 
sider the simpler discrete-time Markov chain, {Z n } ne z + , 
defined by the transition probability density p(z, z f ), then, 
utilizing repeatedly the Markov property, the station- 
ary path distribution of the time-series (zn, zi, . . . , zj) is 
given by 

Q[ 0fT] ({Z n = z n } 0 <n<T) = Prob(z 0 , . . . ,z T ) 

= [i(zo)p(z 0 , zi) . . .^(z r _i, z r ). 

Proceeding, we consider another continuous-time jump 
Markov process {X(t)} te R + defined by perturbing the 
propensity functions by a small vector e e M. K . The corre- 
sponding steady state and path distributions of {X(t)} te ^ + 
are denoted by /x^ +6 (x) and Q°^y res P ec tively. Let the 
two path distributions Q® 0 ^ and Q^x] De absolutely con- 
tinuous with respect to each other which is satisfied when 
a 9 , (x) = 0 if and only if a e + e (x) = 0 holds for all x e E and 
j = 1, . . . ,M. Then, the Relative Entropy of the path dis- 
tribution Q 9 Q r j with respect to Q 0 ^] ls defined similarly 
to (1) as 



K(Q e [0 , T] \Q e ^ 



dQ 



[0,71 



dQ 



[0,71 



dQ 



[0,T]> 



(2) 



where ff e ] is the Radon-Nikodym derivative of Q? 0 T] 

with respect to Q°^y * n ^ act ' usm g the Girsanovs 
formula, we can obtain an explicit expression for the 
Radon-Nikodym derivative in terms of the propensities, 
[34]. In the context of sensitivity analysis, the path- 
wise relative entropy 1Z (q® 0 t ^ | Q d [Qx]) ls a measure of 
information loss due to an 6 -perturbation of the model 
parameters, and consequently it is a natural measure of 
parametric sensitivity. 



Moreover, in the stationary regime, relative entropy 
increases linearly in time, hence the Relative Entropy Rate 
(RER) which is the time average of the relative entropy, 



n (<? I Q* +e ) := Urn \ll (Qf ar] I Qfe) . 



(3) 



is a well-defined quantity, [36]. As first proposed in [19], 
T~L (Q 6 * IQ 6>+6 ) is a suitable time-independent measure of 
sensitivity: it measures the rate of the loss of informa- 
tion due to an 6 -perturbation of the model parameters, in 
the long-time, stationary dynamics regime of the stochas- 
tic process. Furthermore, RER admits an explicit formula 
given by (see Additional file 1 for a rigorous derivation) 



H (Q IQ ) = E^e 



" e «?(x) 

^ (x)log ^ 



/'=1 



- (ag(x)-flg +€ (x)) 



(4) 



Thus, from a practical point of view, RER is an observ- 
able of the stochastic process which can be computed 
numerically as an ergodic average, requiring only the 
knowledge of the propensity functions and the stoichio- 
metric matrix (v)y. Nevertheless, in order to carry out the 
sensitivity analysis in the parameter vector 6, the compu- 
tation of RER for different 6S is necessary which can be 
computationally challenging for high-dimensional param- 
eter spaces. Thus, a sensitivity analysis methodology 
which does not depend on es -such methods are called 
"gradient-free"- is desirable and is developed next. 

Pathwise Fisher information matrix 

Even though not directly evident from (4), a Taylor series 
expansion of RER in terms of e reveals that RER is locally 
a quadratic function of the parameter vector 6 e M. K . 
Indeed, RER is non-negative when e ^ 0 and equals to 
zero when 6=0 thus the linear term in the Taylor expan- 
sion is zero. Therefore, RER is written -under smoothness 
assumptions on the propensity functions with respect to 
the parameter vector 0- as [19], 



n (Q 9 \Q e+e ) = -€ T ¥ n (Q e )€ + 0(|€| 3 ), 



(5) 



where F^(Q^) is a K x K matrix that can be consid- 
ered as a pathwise analogue for the steady state Fisher 
Information Matrix (FIM). Similarly to the steady state 
FIM for parametrized distributions [33], F^(Q^) is the 
Hessian of the RER which geometrically corresponds to 
the curvature around the minimum value of the rela- 
tive entropy rate. The pathwise FIM contains up to third 
order accuracy all the sensitivity information for the path 
distribution at point 0 for any perturbation direction e, 
therefore, the computation of the FIM is sufficient up to 
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third order for the evaluation of all the local sensitivi- 
ties of the path distribution around the parameter vector 
0. Moreover, an explicit formula for the pathwise FIM is 
given by (see Additional file 1 for a derivation) 



multiplication (i.e., (a.b\ = a^b^, k = 1, . 
obtain the logarithmically-scaled FIM: 



.,K), we 



M 



J2*j(x)V 0 log^(x)V, log^(x) 3 

7=1 



(6) 



The implications of this explicit formula are twofold. 
First, it reveals that for many typical reaction networks the 
FIM has a special block-diagonal structure which reflects 
the parameter interdependencies and it is discussed in 
detail below. Second, the FIM is based on the propensity 
functions as well as on their derivatives which are known - 
actually, they define the process- thus the FIM, similarly 
to RER, is numerically computable as an observable of 
the process. Subsequent sections present various strate- 
gies to numerically estimate both the RER and the FIM in 
an efficient fashion . 

Furthermore, the pathwise FIM, F^(Q^), can be used 
not only for the estimation/approximation of RER via (5) 
but also to infer intrinsic knowledge for the systems sen- 
sitivities [16,37]. In general, the spectral analysis of the 
FIM reveals the (local) comparatively most/least sensitive 
directions of the system around 0. Indeed, by ordering the 
eigenvalues of the FIM as 

A.? > • • • > k° K > 0, 

it can be inferred that the most sensitive direction cor- 
responds to the eigenvector with eigenvalue X 6 X while the 
least sensitive direction corresponds to the eigenvector 
with eigenvalue X e K . Additionally, the FIM is one of the 
most useful tools for optimal experimental design. Many 
of the optimality criteria such as D-optimality where the 
determinant of the FIM is maximized or A-optimality 
where the trace of the inverse of the FIM is minimized are 
based on FIM, [37]. 

In the same direction, robustness of the system to 
parameter perturbations or errors as well as parameter 
identifiability can be studied utilizing spectral analysis of 
the FIM. For instance, parameter identifiability is satis- 
fied when all the eigenvalues of the FIM are above a given 
threshold, [18]. 

Sensitivity analysis at the logarithmic scale 

In many biochemical reaction networks, the model 
parameters differ by orders of magnitude and a reasonable 
option for carrying out sensitivity analysis is to perform 
perturbations which are proportional to the parameter 
magnitude. This can be carried out by perturbing the log- 
arithm of the model parameters instead of the parameters 
itself. Using the chain rule V\ o%e f(0) = V e f(0)S7\ oge 0 = 
6.Vof(6) where V is defined as the element by element 



(¥ H (Q^ e )) k =e k e l {¥ H {Q e )) u , k,i = i,...,i<, 

(7) 

where ¥ H (Q e ) is given by (6). Similarly, the logarithmic 
perturbation for the RER is carried out using the pertur- 
bation vector O.e instead of e. Notice that (5) continues to 
be valid for the logarithmic scale, i.e., 

H (Q d \Q d+d - e ) = ±e T F n (Q l °z d )e + 0(|e| 3 ) . (8) 

Linking relative entropy and observables 

As we discussed in the previous sections, relative entropy 
provides a mathematically elegant and computationally 
tractable methodology for the parameter sensitivity analy- 
sis of complex, stochastic dynamical systems. Such results 
focus on the sensitivity of the entire probability distribu- 
tion, either at equilibrium or at the path-space level, i.e., 
for the entire stationary time-series. However, in many 
situations of chemical and biological networks, the inter- 
est is focused on observables such as mean populations, 
population correlations, population variance as well as 
path-space observables such as time autocorrelations and 
extinction times. Therefore, it is plausible to attempt to 
connect the parameter sensitivities of observables to the 
relative entropy methods proposed here. Indeed, relative 
entropy can provide an upper bound for a large family 
of observable functions through the Pinsker (or Csiszar- 
Kullback-Pinsker) inequality, [33]. 

More precisely, for any bounded observable function f 
the Pinsker inequality states that 

IV If] - E Q e + , [f]\<\ f\ loo^^lQ^), (9) 

where || • | |oo denotes the supremum (here, maximum) 
of / An obvious outcome of this inequality is that if the 
(pseudo-)distance between two distributions defined by 
1Z (Q 6 * I Q^ +6 ) is controlled, then the error between the two 
distributions is also controlled for any bounded observ- 
able. In the context of sensitivity analysis, inequality (9) 
states that if the relative entropy is small, i.e., insensitive 
in a particular parameter direction, then, any bounded 
observable / is also expected to be insensitive towards 
the same direction. In this sense, (9) can be viewed as a 
"conservative" -but not necessarily sharp- bound for the 
parametric sensitivity analysis of observables, including 
path-dependent observables such as long-time averages 
and autocorrelations. 

From a practical perspective, (9) can be used as an 
indicator that suggests -even in the presence of a very 
high-dimensional parameter space- which are the insen- 
sitive parameter directions for observables of stochastic 
dynamical systems. The least-sensitive directions can be 
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verified computationally and we present in the sequel 
two examples of this practical strategy in the p53 and 
the EGFR models. More generally, determining insensitive 
directions in parameter space can by particularly useful in 
multi-parameter models in systems biology characterized 
by "sloppiness" , i.e., when most model parameters allow 
for a vast range of perturbations without affecting the 
dynamics, [38]. As we concretely show in the EGFR exam- 
ple, our methodology can easily demonstrate and quantify 
such properties in stochastic dynamics through the use 
of the spectral analysis of the pathwise FIM, even if the 
models include a very large number of parameters. 

Remark L We note that in order to carry out such 
an analysis in a mathematically rigorous manner which 
includes both sensitive and insensitive parameters, we 
need to require that the norm || • \ \oo in (9) is controlled. 
For instance, typical observables in biochemical reaction 
networks are the number of molecules for each species, 
hence f(x) = x. Thus, for reaction networks where the 
population size is large, the Pinsker inequality (9) will 
provide a bound that may not be sharp. In fact, it was 
recently shown that there are alternative bounds which 
are expected to be practically more useful than (9) in the 
sense that they provide sharp upper bounds for observ- 
ables in terms of the relative entropy, [39]. These sharp 
bounds rely on the variational representation of the rel- 
ative entropy and the existence of an explicit minimizer 
for the upper bound. Furthermore, [39] combined with 
our work on RER and pathwise FIM suggests new mathe- 
matical questions towards pursuing practical sharp upper 
bounds involving RER and pathwise FIM. In view of these 
comments, we conclude that (9), (a) constitutes a the- 
oretical indicator that relative entropy is a reliable tool 
for sensitivity analysis and more generally for uncertainty 
quantification; (b) from a practical perspective, it is capa- 
ble to rule out insensitive directions in parameter space, 
which in turn provides a significant advantage in the study 
of "sloppy" multi-parameter models. 

Block-diagonal structure of the pathwise FIM 

In chemical reaction networks, reactions typically depend 
only on a small subset of the parameter vector. Mathemat- 
ically, this is described as 

a°(x)=aj(x;6 kl ,. (10) 

where ki, . . . , ki- e {1,...,K} while Lj K is the num- 
ber of involved parameters in reaction Rj. Using (6), it can 
be shown that this parametric dependence of the propen- 
sities is directly reflected on the pathwise FIM. Indeed, 
after grouping the reactions into subsets in such a way 
that each subset contains the minimum number of reac- 
tions having common parameters, the pathwise FIM - 
upon rearrangement of the parameter vector- becomes 



a block-diagonal matrix. The pathwise FIM is then 
written as 



*H(Q loge ) = 



A{ 0 
LO A? J 



(11) 



where A^ t ... t Aj are block matrices. The block matrices 
which are defined by the reaction subsets with the same 
parametric dependence are easily obtained by creating a 
graph whose nodes are the reactions and the parameters 
while the edges are their dependences. Then, the param- 
eter nodes contained in a connected subgraph define a 
parameter subset which in turn corresponds to a block 
of the FIM. An illustration of this procedure is shown in 
Figure 1 where a reaction network with M = 9 reactions 
and K = 7 parameters is plotted. The parametric depen- 
dencies of the reactions are shown in the left panel where 
4 subgroups of parameters are defined based on the graph 
connectivity. The resulting block-diagonal structure of the 
FIM is shown on the right panel of Figure 1. 

Before proceeding with the theoretical computation of 
the FIM for various well-known classes of biochemical 
reaction networks, we list some of the implications of this 
simplified structure of the FIM in sensitivity analysis and 
elsewhere. 

(i) The sparsity of the FIM is proportional to the 
parametric decoupling between the reactions. 
Knowing a priori the zero elements of the FIM, there 
is no need to numerically compute them. It is clear 
that the computation cost for each sample drops 
from 0(K 2 ) to 0(KL) where L is the largest 
dimension of the block matrices. 

(ii) The inverse of the FIM is also block-diagonal and 
each block of the inverse FIM is the inverse of the 
respective block. This fact allows us in the parameter 
estimation problem to easily evaluate the lower 
bound of the variance, at least for the complete-data 
case [40], i.e., obtain Cramer-Rao bounds, [41,42] 
which are given by the diagonal elements of the 
inverse of the FIM. 

(iii) Relation (11) implies that that optimality criteria in 
optimal experiment design, [37], are significantly 
simplified. For example, the determinant employed in 
the D-optimality test is given by the relation 
det(F^) = nj=i det(A,-), while the trace of the 
inverse of the FIM utilized in the A-optimality 
reduces to tr(F^ _1 ) = Y!i=i te(AJ l ). 

(iv) Given that parametric identifiability is characterized 
by the magnitude of the eigenvalues of the FIM, e.g. a 
zero eigenvalue corresponds to a non-identifiable 
direction in parameter space, [18,43], then the 
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Reactions Parameters Fisher Information Matrix 




Figure 1 The graph representation (left panel) of the dependencies between the reactions (left column) and the model parameters (right 
column) as well as the corresponding block-diagonal structure of the FIM (right panel). The grouping of the parameters is carried out by 
noting the connected parts of the graph. In this example K = 7 while the largest dimension of the blocks is L = 3. 



block-diagonal structure (11) can provide additional 
insights in parameter identification. For instance, the 
identifiability of the parameters of the group that 
corresponds to the i-th block, A» will be increased if 
the smaller eigenvalues of the i-th block can be 
increased. On the other hand, if the determinant of 
the i-th block equals to zero then at least one of its 
eigenvalues is zero and thus the corresponding linear 
combinations of parameters are non-identifiable. 
Similarly, the robustness of the system to 
perturbations of the parameters of the i-th group will 
be increased if there is a way to decrease the larger 
eigenvalues of the i-th block. 

Overall, we note that extracting useful information regar- 
ding model parameters can be performed for each block 
of the pathwise FIM independently. Next, we discuss 
two specific examples of biochemical reaction networks 
where the explicit calculation of the block-diagonal FIM is 
demonstrated. 

Reactions with independent reaction constants (mass action 
kinetics) 

An important class of well-mixed reaction networks take 

Qj 

the general form "otjAj + fyBj —> ..." where Aj and Bj 
are the reactant species while ay and are the respec- 
tive number of molecules needed for the reaction. The 
reaction constant, Oj, is the parameter of the y-th reaction. 
The propensity function for the y-th reaction is given as 
the product between a rate constant and a function of the 
current state x: 



Typically, gj(x) = p.) which stems from the law 

of mass action, however, it can take different forms 
depending on the modeling of the physical process. 
This reaction network has K = M parameters, while 
each propensity depends only on one parameter, i.e., 
Lj = l in (10) for ; = 1,...,M. The (/c,/)-th element 
of the FIM in the logarithmic scale is explicitlyl 
given by 

(f h (q 1o ^)) h = ^ 



x V 



Y^^(x)de k \og^(x)d 0l log^(x) 7 

7=1 



(13) 

where fi 0 is the stationary distribution of the stochastic 
process. Furthermore, it holds that d$ k log<2? (x) = ^ <$*(/) 
where <$(•) is the Dirac function, therefore the pathwise 
FIM is a diagonal matrix with elements given by 



(f„(Q^)) w = 



[a e k (x)] ,l = k 
0, l^k 



(14) 



fly(x) = 0jgj(x), j=l,...,M. 



(12) 



This result demonstrates that the sensitivity of a reac- 
tion constant is proportional to the equilibrium aver- 
age of the respective propensity function. Moreover, 
due to the diagonal form of the FIM, it is straightfor- 
ward to carry out the eigenvalue analysis and infer the 
most/least sensitive directions of the reaction network: 
the eigenvalues of the FIM are its diagonal elements while 
the eigenvectors are the standard basis vectors of R K . 
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Hence, the most (respectively least) sensitive parame- 
ter is obtained from the largest (respectively smallest) 
diagonal element of the FIM. Furthermore, (14) demon- 
strates that the (local) robustness of the reaction net- 
work to a specific parameter is inversely proportional 
to the mean propensity of the corresponding reaction. 
Another observation stemming from the diagonal struc- 
ture of the pathwise FIM is that each rate constant 
can be estimated from time-series data independently 
from the other rate constants. This observation has been 
already pointed out and discussed in the context of max- 
imum likelihood estimation for the complete-data case 
([40], Sec. 10.2). 

Additionally, the mean firing rate of a reaction is equal 
to the mean propensity. Hence, it can be stated that the 
parameters that correspond to the faster reactions, i.e., to 
reactions with larger mean firing rate, are more sensitive 
in a pathwise entropy sense. It should be noted, however, 
that not all observables are sensitive to the parameters 
that correspond to the faster reactions and there are 
examples (see the protein production-degradation model 
in the Results section) where steady state observables 
such as the equilibrium distribution remain insensitive to 
specific perturbation directions even though their mean 
propensity may be increased. Finally, we would like to 
remark that even though E^e [^(x)] = O^E^e [<gf(x)] 
trivially holds true, the diagonal elements of the FIM 
are not linear functions of the corresponding reaction 
constants since the steady state distribution fi e , depends 
also on the parameter vector 0. In fact, high reaction 
constants do not necessarily imply large mean propensi- 
ties and hence a more sensitive parametric dependence. 
This is specifically due to the mean value in (14) and 
as an illustrative example we refer to the simple pro- 
tein production-degradation model (e.g., compare (24) 
and (27)). 

Michaelis-Menten kinetics 

Another class of reaction networks is the Michaelis- 
Menten kinetics. In its simplest form (e.g., single- 
substrate reaction without intermediate), this chemical 
network contains a reaction between species A and B (i.e., 
A —> B) with propensity function given by 



4(x) = 



Ok' + x A ' 



This reaction sub-network which is derived under 
a quasi-steady-state assumption is one of the best- 
known models of enzyme kinetics in biochemistry [44]. 
The parameter % (usually denoted by V max ) repre- 
sents the maximum rate achieved by the system, at 
maximum (saturating) substrate concentrations while 



the Michaelis constant 6^ (usually denoted by K m ) is 
the substrate concentration at which the reaction rate 
is half the maximum value. The propensities of this 
Michaelis-Menten sub-network depend on two parame- 
ters (Lk = 2 in (10)) thus the corresponding FIM block 
is a 2 x 2 matrix. The elements of the FIM matrix are 
given by 



(f«(Q*')) w = 



V Kw] • 



-0*/E„ 



0, 



l = k 
, l = k' 



(15) 



for the /c-th row while the /c'-th row is given by 



(f w (^)) w 



-6 k ,E u 



<(x) 

' 4(x) ' 

e k r+x A 

o, 



l = k f 



l = k 



(16) 



In general, biochemical reaction networks may have sig- 
nificantly more complex propensities, nevertheless, the 
computation of the FIM follows exactly the same calcula- 
tion lines for any propensity function. 

Strategies for the statistical estimation of RER and FIM 

Previous sections introduced and justified RER and FIM 
as appropriate observables for measuring the sensitivity 
analysis of the reaction networks parameters in long-time 
dynamics. This section presents strategies on how to effi- 
ciently estimate these quantities as ergodic averages of the 
underlying stochastic process. 

Unbiased statistical estimators 

Since the stationary distribution, /jl 6 , is usually not known, 
both FIM and RER should be estimated numerically as 
ergodic averages. Indeed, the statistical ergodic estimator 
for RER is given by 



n-l 

7^) = ij>; 



i=0 



M 

E 

;'=i 



dj(xi) log 



(17) 



- (al(xi)-al +e (xi)) 
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where At[ is an exponential random variable with param- 
eter given by the total rate, <2q(x;), while T = Y%=i 
is the total simulation time. The sequence {x/}" =0 is 
the embedded Markov chain with transition probabil- 
ities from state x; to state x/+i is given by the ratio 

a°- (xi) 

j- . The weight Ati, which is the waiting time at 

state x/, is necessary for the unbiased estimation of the 
observable, [45] . Similarly, the unbiased estimator for the 
FIM is 

^ n-l M 

f u = t ^ At iJ2 a J^ Vd lo 8*/ (^V* log^(x;) r 

i=0 ;'=1 

(18) 

Noticing that the computation of the local propensity 
functions a e - (x/) for all/ = 1, ... , M is needed for the sim- 
ulation of the jump Markov process when Monte Carlo 
methods such as SSA [45] is utilized, the computation of 
the perturbed transition rates is the only additional com- 
putational cost for the numerical RER while the additional 
cost for the estimation of the FIM is the computation 
of the derivatives of the propensities. Algorithm 1 sum- 
marizes the numerical computation of RER and FIM, 
employing the SSA for the simulation of the jump Markov 
process. 

Accelerated statistical estimators 

A typical feature of biochemical systems is that the mod- 
eled reaction network is large with hundreds or thou- 
sands of reactions and different time scales stemming 
from the orders of magnitude difference between the 
reaction rates and/or between the species concentra- 
tions, making the SSA extremely slow. A large num- 
ber of multi-scale approximations of the original SSA 
have been developed in order to handle such issues 
resulting to accelerated simulation algorithms. For exam- 
ple, mean-field approximation ignores the fluctuations 
of the stochastic process and yields a deterministic sys- 
tem of ordinary differential equations (ODE) for the 
mean population of the species [46,47]. Stochastic cor- 
rections to the mean-field model such as stochastic 
Langevin [25] and linear noise approximation [48] can 
be applied in order to improve the accuracy of the 
simulation. An alternative approximation of the jump 
Markov process is the tau-leap method proposed by 
Gillespie [24] where a batch of events occurs at each 
time-increment, r. Several improvements of the basic 
tau-leap algorithm on how to select adaptively the 
r [49] or avoiding negative populations [27,50] have 
been proposed, however, their performance is heavily 
model-dependent. 



Algorithm 1 SSA-based numerical computation of RER 
and FIM 

1. Initialize: x = x 0 , T = 0, U = 0 and F = 0. 

2. FOR / = 1, . . . , n 

(a) Compute: j^J (x) J ^ a e Q (x). Compute also 
|^ +6 (x) J M (only for RER) and 

I V e log a 6 j (x) | i (only for FIM). 

(b) At = - log(Ki)/*g(x) where u\ ~ W(0, 1). 

(c) Update time: T = T + At 

(d) Update RER: U = U + At [j^fii */ (x)log 

x ^-(^o«--o +6 «)]- 

(e) Update FIM: F = F + At Y!^ x rf(x)V e log a® 
(x)V,log^(x) r 

(f) Find /* such that Y^!j=i a °j ( x ) < ^2^W 

< YljLj* */ 00 where u 2 - W(0, 1). 

(g) Update state: x = x + v } * . 

3. Normalize: ti = ti/T and F = T/T. 



In this subsection, we propose such approximations in 
order to efficiently compute the FIM and/or RER observ- 
ables, while maintaining controlled bias in the statistical 
estimators. As an illustration, we present the well-known 
mean-field approximation. The popularity of the mean- 
field modeling stems from their computational efficiency. 
To proceed, the stochastic process can be written as 

X(f)=*(f) + i£(f) (19) 

where x(t) is the deterministic part (mean) of the pro- 
cess, § (t) is the stochastic zero-mean part while r\ is the 
amplitude of the stochastic term. The amplitude of the 
stochastic term is proportional to the inverse square root 
of the reactant populations [25,48,51]. Thus, for large 
populations, the fluctuations of the time-evolving species 
populations become vanishingly small compared to the 
deterministic contributions. Consequently, the dominant 
part of the process is the deterministic term whose 
dynamics are governed by the ODE system 

M 

Xi(t) = v M a j (*(*)) > i=l,---,N. (20) 

7—1 

This ODE system is also known as reaction rate 
equations [25]. Restricted for simplicity to the special 
case with independent rate constants for each reaction, 



Pantazis etal. BMC Bioinformatics 201 3, 14:31 1 
http://www.biomedcentral.com/1471-2105/14/31 1 



Page 10 of 19 



the diagonal elements of the FIM are approximated using 
(19) as 

1 w 



;=1 



i=l 



(21) 



Typically, such ODE system is solved using an adap- 
tive time-step numerical integrator up to final time T = 
X!f=o Aft. Thus, for large species populations ^> 1), 
the following numerical estimator for the FIMs diagonal 
elements is obtained: 

1 n 

(i ( u) kk =jJ2 Ati .k=l,...,K (22) 

i—1 

Relation (22) suggests an algorithm similar to Algorithm 1 
for the numerical computation of the FIM where instead 
of SSA, an ODE solver is employed. 

Remark 2. Multi-scale approximations are usually valid 
for large populations and relatively simple systems which 
do not exhibit complex dynamics such as bistability or 
intermittency. Indeed, large deviation arguments [52] 
or even explicitly available formulas for escape times 
[53] demonstrate that stochastic approximations cannot 
always capture correctly exit times, rare events, strong 
intermittency, etc. even in relatively simple systems. How- 
ever, in order to simulate large biochemical systems there 
is often no other alternative than such approximate mod- 
els, which nevertheless need to be employed with the 
necessary caution. 

Remark 3. In biochemical systems, we are interested 
not only in the steady state, i.e., the stationary distribu- 
tion or time-series, but also in the transient regime, e.g. 
signaling phenomena. The extension of the proposed sen- 
sitivity analysis method to the transient regime is justified 
by the fact that the time-normalized relative entropy can 
be also decomposed as a sum of simple integrals [33] 
which results to the fact that the statistical estimators 
(17) and (18) remain valid. In a subsequent section we 
present an example of a biochemical system (EGFR) which 
exhibits transient behavior, and where the proposed sensi- 
tivity analysis tools are tested and validated. The rigorous 
mathematical derivation of the relative entropy rate for the 
transient regime is out of the scope of this publication and 



a dedicated mathematical article on the time-dependent 
relative entropy rate will follow. 

Results and discussion 

A simple protein production/degradation model 

We first consider an elementary stochastic model for pro- 
tein production and degradation, [54], which is also a 
component of more complex models for gene regulatory 
networks, [55]. In this simplified model, the protein is pro- 
duced at a constant rate ki, while it is degraded with rate 
I<2, corresponding to the reactions 



h 

0 +±X. 
k 2 



(23) 



Accordingly, the corresponding propensity functions for 
the current state x = x are: 



a\ (x) = k\ and a<i (x) = k^x . 



(24) 



We consider this simple stochastic model due to the 
available analytic representations of the steady state 
(equilibrium) distribution, time-dependent moments and 
autocorrelations, ([46], Sec. 7.1). Consequently, we can 
both illustrate the proposed pathwise sensitivity analy- 
sis, as well as compare it to the standard equilibrium 
FIM, revealing concretely differences between the two 
approaches. 

The equilibrium distribution, fji 0 , of this simple network 
is a Poisson distribution with parameter |k Therefore, the 

equilibrium FIM for the parameter vector 0 =[/<i,/<2] r is 
given in logarithmic scale by 



k 2 



1 

-1 



(25) 



On the other hand, the pathwise FIM is computed via 
(14): 



F^(Q log *)=/<i 



where we used that 



1 0 
0 1 



(26) 



V t^iWl = V t*2(*)] = h • (27) 

The complete calculations can be found in the 
Additional file 2. Some of the implications of the differ- 
ences between these two FIMs are discussed next. 

First, we observe that the equilibrium FIM, (25), is sin- 
gular, i.e., one of the eigenvalues is zero. We readily see 
that in the parameter direction defined by the correspond- 
ing eigenvector, i.e., when the parameter ratio, ^, remains 
constant, the system is expected to be insensitive, at least 
with respect to the equilibrium distribution. Clearly, this 
is a fact verified directly from the Poisson equilibrium 
distribution fi° which depends only on the ratio. On the 
other hand, the pathwise FIM, (26), is not singular and 
all the directions are equally sensitive. This fact suggests 
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that observables for dynamic quantities may be sensi- 
tive not only to parameter ratio perturbations but also to 
other parameter perturbations. Indeed, one such exam- 
ple is the stationary autocorrelation function, which in the 
case of the simple protein production/degradation model 
is explicitly given by ([46], Sec. 7.1), 

(X t ,X 0 ) s = ^e-to (28) 

where (•, -) s denotes stationary averaging. Based on this 
formula, it is obvious that the autocorrelation function is 
also sensitive to I<2, in addiction to the ratio ^. This exam- 
ple demonstrates that in contrast to the pathwise FIM, 
the equilibrium FIM is inadequate to fully capture the 
dynamic properties of the process. Moreover, the path- 
wise FIM depends linearly only on k\, which shows that 
the reaction rate constants and propensity functions in 

(24) alone, can be misleading in the assessment of para- 
metric sensitivity. Contrary, the mathematically correct 
equilibrium averaging of the propensities, i.e., (14) can 
lead to a completely different outcome, as can be readily 
seen when we compare (24) and (27). 

In terms of parameter identifiability, the fact that one 
of the eigenvalues of (25) is zero implies that that the 
two-dimensional parameter vector of the system is non- 
identifiable. Indeed, the asymptotic normality of the max- 
imum likelihood estimators, [41,42], states that their 
variance (also a lower bound according to the Cramer 
Rao theorem), which determines parameter identifiabil- 
ity of k\ and I<2, is the reciprocal of the eigenvalues of 

(25) . A straightforward calculation involving the eigen- 
vectors of (25) shows that the only identifiable parameter 
is the ratio of the reaction constants appearing in (25). 
Therefore parametric inference for both parameters from 
equilibrium data is not possible. On the other hand, the 
pathwise FIM (26) is not singular, which readily implies 
that both parameters can be identified through (complete) 
time-series data, provided that k\ ^ 0. Summarizing, 
this birth/death model is an example where equilibrium 
sampling is not enough for the identifiability of all the 
parameters, however, if dynamics data are available and 
are taken into account then all the parameters become 
identifiable as pathwise FIM asserts. 

The p53 gene model 

The p53 gene plays a crucial role for effective tumor sup- 
pression in humans as its universal inactivation in cancer 
cells suggests [28,56,57]. The p53 gene is activated in 
response to DNA damage and gives rise to a negative feed- 
back loop with the oncogene protein Mdm2. Models of 
negative feedback are capable of oscillatory behavior with 
a phase shift between the gene concentrations. Here, we 
perform sensitivity analysis to a simplified reaction net- 
work between three species, p53, Mdm2-precursor and 



Mdm2 introduced in [28]. The model consists of five 
reactions and seven parameters provided in Table 1. The 
nonlinear feedback regulator of p53 through Mdm2 takes 
place in the second reaction while the remaining four 
reactions fall in the special class where each reaction 
depends on one parameter. Due to these mechanisms a 
nontrivial steady state regime exists and can be charac- 
terized by random oscillations, see for instance Figure 2. 
The proposed sensitivity methodology is directly appli- 
cable, and the corresponding pathwise FIM, see (13) and 
Figure 1, consists of 5 diagonal blocks with respective size 
1 x 1, 3 x 3, 1 x 1, 1 x 1, 1 x 1. Furthermore, the sen- 
sitivity analysis of this model has been performed earlier 
in [18] based on a linear noise approximation. Here, we 
present a detailed comparison between the two sensitivity 
analysis methodologies, since the one proposed here does 
not involve any approximation of the stochastic network 
dynamics. 

Figure 2 shows the time-series of the species for the 
parameter values in Table 2. Evidently, oscillatory behav- 
ior is observed at this parameter regime, where persistent 
random oscillations occur, ranging between high and low 
populations. On the other hand, the frequency of the 
oscillations is less variable as it has been already reported 
both experimentally and numerically [28] . Another inter- 
esting observation is that the concentration of p53 species 
usually attains the lower bound of its admissible value 
(populations cannot be negative) which results in stochas- 
tic effects far away from Gaussianity, as can be readily seen 
also in Figure 2. 

Proceeding, we denote by 6 = [b x , a x , aj <} k, b y , a y \ T 
the parameter vector. The numerical estimator for RER as 
well as for the pathwise FIM in the logarithmic scale are 
computed utilizing Algorithm i. Logarithmic sensitivity 
analysis is preferred because the range of the parameters 
values varies by orders of magnitude as can be seen in 
Table 2. The upper plot in Figure 3 shows the RER as a 
function of time for various perturbations. Viewing RER 
as an observable, it is striking the speed of relaxation of the 
estimator. Within two or three oscillation periods, RER 



Table 1 The reaction table with x corresponding to p53, yo 
to Mdm2-precursor while /corresponds to Mdm2 



Event 


Reaction 


Rate 




Rate's derivative 






ai(x) 


= b x 


V0Gi(x) = [1,0, 0,0, 0,0,0] r 


Ri 




o 2 (x) 




V e o 2 {y) = [O l x,xy/(x + k), 
-a k xy/(x + k) 2 ,0, 0, 0] r 


R3 


x -> x + y 0 


o 3 (x) 


= byX 


V e a 3 (x) = [0,0, 0,0, x, 0,0] r 


Ra 


yo^y 


fl 4 (X) 


= o 0 y 0 


V0G 4 (x) = [0,0,0, 0,0, y 0 ,0] r 


Rs 


y -> 0 


o 5 (x) 


= o y y 


V 0 a 5 (x) = [0,0,0,0,0,0,y] r 



The state of the reaction model is defined as x = [ x, y 0 , y ] T while the parameter 
vector is defined as 0 = [ b x , a x , a k , k, b y , oq, a y ] T . 
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Figure 2 Molecule concentration of p53, Mdm2-precursor and Mdm2. Concentration oscillations as well as time delays (phase shifts) between 
the species are present due to the negative feedback loop. Furthermore, the concentration of p53 periodically approaches zero and since negative 
concentrations are not allowed, the stochastic characteristics of p53 are far from Gaussian. 



has been converged to its value even though the three 
species have significant oscillations and stochasticity, as 
Figure 2 shows. A primary reason for the fast relaxation 
is the numerical estimator of RER where the summation 
is over all reactions even though only one reaction takes 
places at each jump (see (18)). Having the important prop- 
erty of fast convergence, global sensitivity analysis, where 
not only a point of the parameter regime but also large 
subsets of the parameter space, can be efficiently per- 
formed, [15]. The lower panel of Figure 3 shows the RER 
when only one of the parameters are perturbed by +10% 
or by -10%. Additionally, the RER computed from the FIM, 
utilizing (5), is also provided. The FIM approximation of 
RER is a second order approximation in terms of \e |, hence 
the computation of FIM is typically enough to fully resolve 
the local sensitivities of a model. Evidently, the most sensi- 
tive parameters here are b x and while the least sensitive 
parameters are a x and k. 

Comparison to the LNA-based sensitivity approach 

In [18], the authors suggested a linear noise approxi- 
mation (LNA) for the stochastic evolution around the 
nonlinear mean-field equation, and based on this approx- 
imation a system of ODEs is derived for the mean and the 
covariance matrix of the approximation process. Since the 
noise of LNA is Gaussian, the mean and the covariance 
matrix contains all necessary information regarding the 
approximate stochastic model. Then, the associated FIM 



Table 2 Parameter values for the p53 model 



Parameter 


b x 


a x 


<*k 


k 


by 


O0 


a y 


Value 


90 


0.002 


1.7 


0.01 


1.1 


0.8 


0.8 



is derived and based on it, the sensitivities for each param- 
eter are computed. Although there are regimes where 
this approximation is applicable (short times, high popu- 
lations, systems with a single steady state, etc.), for sys- 
tems with nontrivial long-time dynamics, e.g. metastable, 
it is not correct as large deviation arguments [52] and 
explicit formulas for escape times [53] show. Similar issues 
with non-gaussianity in the long-time dynamics arise in 
stochastic systems with strongly intermittent (pulse-like) 
or random oscillatory behavior [58]. In the p53 model 
considered in [18] which had the same parameter values as 
here, Figure 2 reveals that the time-series of the p53 popu- 
lations persistently fluctuate between high and low values, 
thus the LNA approximation may not be accurate at least 
when the concentration of the species is very low. 

At first pass, when the parameters are grouped into two 
classes depending on their sensitivities, the two sensitivity 
approaches produce qualitatively similar results. Indeed, 
by visual inspecting the lower plot of Figure 3 in the cur- 
rent publication and Figure three in [18], the (more) sensi- 
tive parameters in both methods are b x , b yi a^, a y while 
the practically insensitive parameters are a xi k. However, 
upon closer inspection, the two methods produce differ- 
ent results. Figure 4 shows the proposed FIM (left) based 
on the exact (without any approximations) pathwise rel- 
ative entropy theory, as well the FIM proposed in [18] 
which is derived from the LNA of the reaction system. The 
results are completely different and the proposed pathwise 
FIM is sparse as expected. A striking difference between 
the two sensitivity approaches is that the sensitivity of 
parameter b x in our proposed method is relatively high 
compared to the other parameters while the sensitivity of 
b x in the LNA-based method is at least one order lower 
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Figure 3 RER in time (upper panel) for the parameter perturbation of b x (blue), k (green) and a y (red) by +10% (i.e., e 0 = 0.1 ) as well as 
RER for various perturbation directions (lower panel) computed either directly (blue and red bars) or based on FIM (green bars). Direction 
ek corresponds to perturbation of parameters- 



compared to the other parameters, see Figure 4 (dark 
blue) and also compare Figure 3 of this publication and 
Figure three in [18]. 

As a means of comparison between the methods, we 
perturb b x as well as b y by the same amount and observe 
the Power Spectral Density (PSD), i.e., the square of 
the absolute value of the Fourier transform of each 
species' time-series. Given the sustained random oscilla- 
tions observed in the p53 model, see Figure 2, the PSD 
is a suitable observable since it identifies the dominant 
periodicities and corresponding amplitudes in station- 
ary time-series, [41]. Using the Pinsker inequality (9) as 
a guideline, we expect that the observable will not be 
sensitive to the least sensitive directions of the FIM, there- 
fore, we focus on the most sensitive directions of the 
FIM identified in Figure 4. Figure 5 shows the averaged 
PSD for the three species of the model for the unper- 
turbed case (black lines), the perturbation of b x only 
(blue lines) as well as the perturbation of b y only (yellow 
lines). One hundred realizations were used for the averag- 
ing procedure while the perturbation strength was +20%. 
The Li-norm, i.e., the integral of the absolute difference, 
between the unperturbed PSD and the ^ -perturbation is 
8.56 • 10 5 while the Li-norm between the unperturbed 
PSD and and the ^-perturbation is 4.32 • 10 5 which is 
about the half value. Hence, the averaged PSD -primarily 



the amplitude of the oscillations- is more sensitive to 
perturbations of b x rather than to perturbations of b y as 
our sensitivity analysis method predicted while the LNA- 
based method suggested the reverse order of sensitivity. 
We note that the choice of the L\ norm for the PSDs 
is justified since it describes the total energy (power) 
of the time-series, when viewed as a signal, [41]. An 
explanation of the performance of the LNA-based sen- 
sitivity analysis stems from the fact that the p53 species 
does not have Gaussian noise when the population is 
close to zero, and which can indeed occur frequently, 
see Figure 2 (blue line). Additionally, notice that both 
b x and b y affect the concentration of p53 explicitly or 
implicitly through the associated reactions thus their sen- 
sitivities are heavily biased due to the wrong statistical 
approximation of the p53 species. Moreover, we note that 
other observables, e.g., the arg max (location of the max- 
imum) of the PSD, can be sensitive to perturbations of 
b y , see Figure 5. This observation is not contradictory 
to the findings of the proposed sensitivity methodology 
for two reasons: first, even though the Pinsker inequality 
(9) points towards the right direction regarding sensitive 
parameters for observables, it is only an upper (possibly 
crude) bound. Second, even though b x is more sensitive 
in absolute value than b y in terms of the proposed path- 
wise FIM, both b x and b y sensitivities have the same order 
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Path-wise FIM LNA-based FIM 




Figure 4 The proposed pathwise FIM (left) based on RER as well as the (scaled) FIM based on LNA computed from the StochSens package 
[59]. Evidently, the proposed method uncouples the parameter correlations since most of the off-diagonal elements are zero. 
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Figure 5 Power Spectral Densities (PSD) of the time-series of the species in the p53 model for the unperturbed parameter regime (black), 
when b x is perturbed by +20% (blue) as well as when b y is perturbed by +20% (yellow). The value and the position of the prominent peak of 
the PSD are related with the amplitude and the frequency of the species oscillations. The visual comparison between the averaged PSDs suggests 
that the spectral properties are more sensitive to b x than to by. 
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of magnitude (see Figure 4) therefore there should exist 
observables which are also sensitive to b y perturbations. 
Finally, for the sake of completeness, we report the observ- 
able values for the insensitive parameters. The Li-norm 
between the unperturbed PSD and the ^-perturbation is 
9.03 • 10 3 which is approximately two orders of magni- 
tude less than the Li-norm for the sensitive parameters. 
Similar results hold for the other insensitive param- 
eter, k. Thus, the outcome of the Pinkser inequality 
(9), i.e., that small RER values imply relatively insensi- 
tive parameters for the observables, is also numerically 
verified. 

Epidermal growth factor receptor model 

The EGFR model is a well-studied system describing sig- 
naling phenomena of (mammalian) cells [29-31]. As its 
name suggests, EGFR regulates cell growth, survival, pro- 
liferation and differentiation and plays a complex and 
crucial role during embryonic development and in tumor 
progression [60,61]. In this paper, we study the reac- 
tion network model for the dynamics of EGFR developed 
by Schoeberl et al. [31] which consists of 94 species 
and 207 reactions. Figure 6 presents the EGFR reac- 
tion network in an abstract level. Initially, the extra- 
cellular binding of EGF with the EGF receptors induce 
receptor dimerization. Then, two principal pathways, 
She-dependent and Sch-independend, are initiated lead- 
ing to activation of Ras-GTP. Subsequently phospho- 
rylation of MEK kinase through the activation of Raf 
kinase occurs leading to the phosphorylation of ERK 
kinase which regulates several proteins and nuclear tran- 
scription factors inside the cell. The detailed graphi- 
cal description of the reaction network can be found 
in the Figures one & two of supplementary informa- 
tion in [31]. For completeness, all the reactions along 
with their rates are provided in the Additional file 3 of 
this publication. 

The propensity functions for the reactions R±, . . . , 
R97> Rioo> • • • > ^207 of the EGFR network are written in the 
general form 

aj(x) = kj(^fy / = 1, . . . , 97, 100, . . . , 207 

(29) 



with the exception of reaction pair R^R^ where their 
propensity functions are governed by the Michaelis- 
Menten kinetics 



aj(x) = V m ^Aj/(K m + x Aj ), j = 98, 99 



(30) 



where x is the current state of the reaction system while Aj 
corresponds to the reacting species. The parameter vector 
contains all the reaction constants, 

6 = [k\ f . . . , 1(97, Vmax> K m , /cioo> • • • > /<207] T > 

with all values provided in the Additional file 3. Due to the 
specific values of the reaction constants as well as the ini- 
tial population of the species (see Table 3), the firing rates 
between reactions differ by many orders of magnitude giv- 
ing rise to a highly stiff network. Therefore, even though 
there are some stochastic implementations, [27], here for 
the purposes of RER and FIM calculations, we adopt 
the mean-field approximation discussed in the acceler- 
ated estimators subsection. We solve the derived system of 
ODEs with Matlabs routine ode 15s and compute the FIM 
at the steady state regime which corresponds to the time 
interval [500,700]. The completion of the internalization 
process needs about 500 seconds. It should be noted here 
that even though the simulation of the EGFR is performed 
utilizing a deterministic approximation model, the com- 
puted pathwise FIM has been derived from the stochastic 
network, i.e., (13). This approximation is expected to be 
valid in the sense of (19) due to the large populations con- 
sidered here. Overall, the computed FIM is a sparse matrix 
and measures efficiently the sensitivities of the stochastic 
model in a gradient-free manner. 

The upper plot of Figure 7 shows the diagonal ele- 
ments of the FIM in descending order computed at the 
steady state regime. We report our results in the for- 
mat of Figure 7 in order to be able to accommodate 
the large number of parameters in the model. The /c-th 
diagonal element of the FIM corresponds to RER where 
the perturbation takes place only to the /c-th parameter 
(see (5)). Figure 7 (upper plot) in conjunction with 
Table SI of the Additional file 4 fully describe the 
(local) sensitivities of the reaction network. Table SI 
in Additional file 4 presents the reaction constants 
ordered from the most sensitive to the least sensitive 
parameter. Moreover, the FIM is diagonal -except a 
small 2x2 block associated with the Michaelis-Menten 
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Table 3 Initial population of the species for the EGFR network 
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reactions- therefore the diagonal elements correspond 
to the eigenvalues of the FIM. The sensitivity analy- 
sis depicted in Figure 7, demonstrates that most model 
parameters allow for a vast range of perturbations with- 
out affecting the dynamics. Furthermore, this robustness 
to variations in most parameters was also reported in the 
original, fully deterministic EGFR model in [31]. This is 
a feature shared by many multi-parameter models in sys- 
tems biology and which is known as "sloppiness" , [38]. 
Our methodology can easily demonstrate such properties 
in stochastic dynamics, as we can readily see in Figure 7, 
even if the models include a large number of parameters. 

The previous discussion refers to the analysis of the 
EGFR model to the steady state regime. On the other 
hand, EGFR is a signaling model whose transient regime, 
in addition to the steady state, is of great interest. As dis- 
cussed in Remark 3, we can justify the application of the 
RER and FIM sensitivity analysis in the transient regime. 



Therefore, we compute the proposed FIM at the time 
interval [0, 10], using (22). The lower plot of Figure 7 
shows the diagonal elements of the pathwise FIM in 
the transient regime while keeping the ordering of the 
parameters unchanged from the upper, steady state plot. 
The parameter sensitivity ordering is completely different 
meaning that the sensitivities are time-dependent in the 
transient regime. For instance, the most sensitive parame- 
ters in the stationary regime correspond to the final prod- 
ucts of the reaction network, however, in the time interval 
[0, 10] these species have not been produced yet resulting 
to insensitive reaction constants. In terms of parameter 
identification and estimation, the time-dependent sensi- 
tivities imply that in order to extract the maximum infor- 
mation content from the experimental data, we have to 
estimate the parameters drawing samples from different 
time intervals. These time intervals should be defined 
based on the respective sensitivity indices and selected 
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Figure 7 Diagonal elements of the FIM computed at the steady state regime (upper plot) and at the transient regime (lower plot). Note 
the changes in sensitivity and consequently the parameter identifiability. The parameter sensitivities differ by orders of magnitude. 
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in order to maximize the identifiability for each set of 
parameters. 

Finally, the Pinsker inequality (9) suggests that insen- 
sitive parameters can be perturbed, even significantly, 
without affecting species concentrations or other observ- 
able. As an illustration of this fact, we present in Figure 8 
the concentrations of various critical species of the EGFR 
model when the 140-th (kes) most sensitive parameter is 
perturbed (see Table SI in Additional file 4). The rate con- 
stant /<65 corresponds to a reaction of the She-dependent 
pathway module. Solid blue lines correspond to the unper- 
turbed parameter case while the dashed red lines corre- 
spond to the perturbed case where the perturbation is a 
multiplication by a factor of ten of /<65. We present the 
total number of (EGF-EGFR*)2 binding species without 
Sch* (top, left panel) and with Sch* (top, middle panel) 
as well as Ras-GTP (top, right panel), total activated Raf 
or total Raf (low, left panel), doubly phosphorated MEK 
or MEK-PP (low, middle panel) and doubly phosphor- 
ated ERK or ERK-PP. These species are important for the 
understanding of the system since the different modules of 
the EGFR reaction network communicate through them 
(see Figure 6). It is evident from Figure 8 that the various 
species concentrations remain unchanged to perturba- 
tions of the insensitive parameter I<e>5 as it was expected 



from (9). Moreover, we notice that although the average 
populations become large, which implies that the maxi- 
mum norm in (9) is also large, we still obtained robust 
results regarding /C65S parameter sensitivity. 

Conclusions 

In this paper, we applied and extended a recently proposed 
parametric sensitivity analysis methodology to complex 
stochastic reaction networks. This sensitivity analysis 
approach is based on the quantification of informa- 
tion loss along different parameter perturbations between 
time-series distributions. This is achieved by employing 
the Relative Entropy Rate, which is directly computable 
from the propensity functions. A key aspect of the method 
is that we can derive rigorously an associated Fisher 
Information Matrix on path-space, which in turn consti- 
tutes a gradient-free approach for parametric sensitivity 
analysis; as such it provides a significant advantage in 
stochastic systems with a large number of parameters. 
We demonstrated that the structure of the pathwise FIM 
revealed hidden parameter interdependencies between 
the reactions. The block-diagonal structure of the FIM 
highlighted the sparsity of the matrix which resulted in 
further improvements in the computational efficiency of 
the proposed method. Therefore, parametric sensitivity 




Time (s) Time (s) Time (s) 

Figure 8 Time-dependent concentration of various species of the EGFR network either for the unperturbed parameter vector (solid blue 
lines) or for the perturbed one (dashed red lines). The 1 40-th most sensitive parameter fes) is ten-fold increased and the species concentrations 
are not affected. For the least sensitive parameters such as k 65l we rigorously know from the Pinsker inequality (9) that they should not alter the 
concentration values or any other observable even when they are heavily perturbed. 
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analysis for high-dimensional stochastic reaction systems 
becomes tractable since it is well-known that in high 
dimensional stochastic systems sensitivity analysis tech- 
niques can involve estimators of very high variance, e.g. 
in finite difference methods and their recently proposed 
variants, which can present an overwhelming computa- 
tional cost. Additionally, we proposed the use of mul- 
tiscale numerical approximations of stochastic reaction 
networks in order to derive efficient statistical estimators 
for the FIM and implemented one such approximation 
(mean-field) in a high-dimensional system. 

The proposed pathwise sensitivity analysis method is 
tested and validated on three biological systems: (a) a sim- 
ple protein production/degradation model where explicit 
solutions are available, (b) the p53 reaction network where 
quasi-steady stochastic oscillations of the concentrations 
are observed and where multiscale stochastic approxi- 
mations break down due to the persistent oscillations 
between low and high populations, and (c) a stochastic 
EGFR model which is an example of a high-dimensional 
reaction network with more than 200 reactions and a cor- 
responding number of parameters. In the EGFR reaction 
network, we combined the proposed pathwise FIM which 
has been derived from the stochastic network and the 
mean-field approximation which is used for the efficient 
estimation of the pathwise FIM. Moreover, our earlier rig- 
orous analysis for the steady state regime [19] suggests 
suitable extensions in the transient regime which were 
tested and validated for the EGFR model. We will present 
the full rigorous theory in an upcoming publication. 

Finally, we note that the relation between RER and 
various observables is not straightforward. However, we 
note that the path distribution contains all information 
regarding the process including the steady state and all 
time-dependent observables: practically, our proposed 
sensitivity analysis represents a "conservative" sensitivity 
estimate in the sense that insensitive directions for the 
relative entropy on path-space will yield insensitive direc- 
tions for every observable. This latter statement can be 
justified mathematically through the Pinsker inequality 
(9) which was tested in the examples considered here. 
Based on these observations, the proposed sensitivity 
analysis methods can be deployed in complementary fash- 
ion with existing sensitivity analysis tools, as it can be 
used to narrow down the most sensitive directions in 
a system. 

Additional files 



Additional file 1 : The detailed derivation of relative entropy rate and 
the associated Fisher information matrix. 

Additional file 2: The calculation of equilibrium and pathwise FIMs 
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